|
Relational division is an operator of the relational algebra that
realizes universal quantifications in queries against a relational
database. Expressing a universal quantification problem in SQL is
cumbersome. If the division operator would have a counterpart in a
query language, a more intuitive formulation of universal
quantification problems would be possible. Although division is a
derived operator--it can be expressed using other basic algebra
operators--the performance of queries involving a division problem
can be significantly increased if it is implemented as a separate
operator in a query processor.
In this work, we study relational division as well as operators
related to it like set containment join. We present a classification
of division algorithms and define algebraic laws to enable query
optimization for queries involving division. A natural
generalization of the division operator, called set containment
division, is proposed and algorithms that realize it are discussed,
including an algorithm that uses a new in-memory index data
structure. Furthermore, we study strategies for parallelizing set
containment division algorithms.
The most popular data mining sub-task called frequent itemset
discovery, where one tries to find the number of transactions that
contain a given set of items, is an excellent example of a problem
that requires efficient set containment tests. In this work, we
focus on frequent itemset discovery algorithms that exploit the
query processing capabilities of a relational database system. After
reviewing the best algorithms based on SQL-92, we suggest a new
approach, called Quiver, that can be executed with the help of set
containment division.
We report on performance results that cover experiments with
frequent itemset discovery algorithms running on commercial database
systems as well as experiments that highlight the most important
properties of set containment test algorithms using a query
processor prototype implemented in Java.
|